Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing
Abstract
In distributed stream processing environments, large numbers of continuous queries are distributed onto multiple servers. When one or more of these servers become overloaded due to bursty data arrival, excessive load needs to be shed in order to preserve low latency for the query results. Because of the load dependencies among the servers, load shedding decisions on these servers must be well-coordinated to achieve end-to-end control on the output quality. In this paper, we model the distributed load shedding problem as a linear optimization problem, for which we propose two alternative solution approaches: a solver-based centralized approach, and a distributed approach based on metadata aggregation and propagation, whose centralized implementation is also available. Both of our solutions are based on generating a series of load shedding plans in advance, to be used under certain input load conditions. We have implemented our techniques as part of the Borealis distributed stream processing system. We present experimental results from our prototype implementation showing the performance of these techniques under different input and query workloads.
Related resources
Content-based Load Shedding in Multimedia Data Stream Management System
Overload management has become very important in public safety systems that analyse high performance multimedia data streams, especially in the case of detection of terrorist and criminal dangers. Efficient overload management improves the accuracy of automatic identification of persons suspected of terrorist or criminal activity without requiring interaction with them. We argue that in order t...
A Framework For Supporting Load Shedding in Data Stream Management Systems
The arrival rate of tuples in a data stream can be unpredictable and bursty. Many stream-based applications have Quality of Service (QoS) requirements that need to be satisfied by the underlying stream processing system. In order to avoid violating predefined QoS requirements during temporary overload periods, a load shedding strategy is necessary and critical for a data stream management syste...
How to Screen a Data Stream - Quality-Driven Load Shedding in Sensor Data Streams
As most data stream sources exhibit bursty data rates, data stream management systems must recurrently cope with load spikes that exceed the average workload to a considerable degree. To guarantee low-latency processing results, load has to be shed from the stream, when data rates overstress system resources. There exist numerous load shedding strategies to delete excess data. However, the cons...
Load Management and High Availability in the Borealis Distributed Stream Processing Engine
Borealis is a distributed stream processing engine that has been developed at Brandeis University, Brown University, and MIT. It extends the first generation of data stream processing systems with advanced capabilities such as distributed operation, scalability with time-varying load, high availability against failures, and dynamic data and query modifications. In this paper, we focus on aspects...
SOSA: A Safe Load Shedding Approach for Monitoring Data Streams in Real-Time
Real-time stream processing is essential for many real-life stream-based applications. Systems designed to run such applications must be prepared to operate under overloaded conditions. Existing load shedding techniques are not suitable for processing data streams with stringent timing constraints because their tuple dropping policies may violate application deadlines in an uncontrolled way. To...